Configure arbitrary frozen modules via config #869

Open · wants to merge 1 commit into main

Conversation

lkhphuc (Contributor) commented Feb 20, 2025

I saw there is an issue about this (#306), but it has not been implemented yet, so I created this PR.

It allows you to specify a list of modules to be frozen via the config file or the command line, e.g. --model.frozen_modules='tok_embeddings,layers.0.attention'.

It also prints the number of frozen and trainable parameters during initialization.

  • No frozen modules.
    (screenshot: printed parameter counts)
  • [model] frozen_modules = 'tok_embeddings,tik_embeddings,layers.0.attention'
    (screenshot: printed frozen vs. trainable parameter counts)

Hope this helps.
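For reference, here is a minimal sketch of how such an option could be applied to a model. It is illustrative only and assumes a plain nn.Module; the helper name apply_frozen_modules is hypothetical and not the exact code in this PR:

```python
import torch.nn as nn


def apply_frozen_modules(model: nn.Module, frozen_modules: str) -> None:
    """Freeze the submodules named in a comma-separated list, e.g.
    'tok_embeddings,layers.0.attention', then report parameter counts.
    Illustrative sketch only, not the exact implementation in this PR."""
    names = [n.strip() for n in frozen_modules.split(",") if n.strip()]
    submodules = dict(model.named_modules())
    for name in names:
        module = submodules.get(name)
        if module is None:
            # Unknown names are skipped here; a real implementation might warn or raise.
            continue
        for param in module.parameters():
            param.requires_grad = False

    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Frozen parameters: {frozen:,} | Trainable parameters: {trainable:,}")
```

With the example flag above, tok_embeddings and layers.0.attention would be looked up in model.named_modules() and all of their parameters would get requires_grad = False.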

fegin (Contributor) commented Feb 20, 2025

I'm not sure this should be a configurable option. Instead, if a model requires some parts to be frozen, that should be coded in the model. Our trainer should be able to support the different use cases, including cases where some of the parameters are frozen, and still load the checkpoints correctly.

tianyu-l (Contributor) commented

Whether or not this should be configurable via toml depends on the use case. As @fegin pointed out, in most use cases we've heard of, e.g. freezing certain parts (such as the image encoder) of a multimodal/diffusion model, it is a static decision and the model code should handle it. It should only be configurable if the set of trainable parameters would change or shift during the training process. Can you give some examples of such cases?

We will need this mechanism down the road anyway, but probably first via a util function (called from model code), similar to this function in torchtune:
https://github.com/pytorch/torchtune/blob/main/torchtune/modules/peft/_utils.py#L65
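For illustration, a util function in that spirit, called from model code rather than driven by a toml/CLI option, could look roughly like the sketch below. This is not torchtune's exact API (the function at the linked line may differ); the name and signature here are assumptions:

```python
from typing import Iterable

import torch.nn as nn


def set_trainable_params(model: nn.Module, trainable_prefixes: Iterable[str]) -> None:
    """Illustrative sketch, not torchtune's exact signature: freeze every
    parameter except those whose names start with one of the given prefixes."""
    prefixes = tuple(trainable_prefixes)
    for name, param in model.named_parameters():
        # str.startswith accepts a tuple, so one check covers all prefixes.
        param.requires_grad = name.startswith(prefixes)
```

The key difference from this PR is that the decision about what to freeze lives in the model code instead of in the job config.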

lkhphuc (Contributor, Author) commented Feb 21, 2025

I understand if you would want this in the model code. But my use cases are a bit more dynamic, so I implemented this in a generic manner.
This way there is one less thing to duplicate when we switch between different models.
Even when training the same multimodal model, there are usually multiple stages of training that share the same modelling code.
For example, some actual scenarios (sketched as configs below):

  • Freeze both encoder and decoder, train only the projector/connector.
  • Freeze everything except the patch embedding to adapt to a new data domain.
  • Freeze the decoder, finetune the encoder + projector, etc.
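To make these stages concrete, here is a hypothetical sketch of what the proposed frozen_modules setting could look like for each one; all module names are illustrative placeholders and depend on the actual model definition:

```python
# Hypothetical per-stage values for the proposed --model.frozen_modules option.
# The module names are illustrative, not taken from a real config.
STAGE_FROZEN_MODULES = {
    # Stage 1: freeze encoder and decoder, train only the projector/connector.
    "train_projector": "vision_encoder,decoder",
    # Stage 2: freeze everything except the patch embedding to adapt the data domain.
    "adapt_patch_embed": "vision_encoder.blocks,projector,decoder",
    # Stage 3: freeze the decoder, finetune encoder + projector.
    "finetune_encoder": "decoder",
}
```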

> including cases where some of the parameters are frozen, and still load the checkpoints correctly

This case is indeed cleaner to handle in the model code.
